fix: resolve GBK encoding errors on Windows for Chinese content #610
Mars-ending wants to merge 1 commit into OpenBMB:main from
## Problem

On Windows systems with a Chinese locale, Python's stdout uses GBK encoding by default. This causes `UnicodeEncodeError` when:

1. Model responses contain CJK characters or emoji
2. Logs are written to files via `FileHandler` without an encoding specified

Error example: `'gbk' codec can't encode character '\U0001f4d6' in position 1189`

## Changes

1. server_main.py:
   - Wrap sys.stdout/stderr with a UTF-8 `TextIOWrapper` on Windows
   - Add `encoding='utf-8'` to the `FileHandler` for server.log
2. utils/structured_logger.py:
   - Add `encoding='utf-8'` to the `FileHandler` for workflow logs
3. utils/logger.py:
   - Wrap `print()` in try/except as a fallback for `UnicodeEncodeError`
   - On an encoding error, strip the problematic characters gracefully

## Testing

Verified on Windows 11 with a Chinese locale:

- Workflows with Chinese task prompts now complete without encoding errors
- Generated files correctly contain Unicode characters (CJK, emoji)

---

Co-Authored-By: Claude <noreply@anthropic.com>
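For readers who want to see the shape of the three changes, here is a minimal sketch. It is not the code from this PR: the helper names `wrap_utf8`, `configure_stdio`, `make_file_handler`, and `safe_print` are hypothetical, and the sketch assumes the fallback strips characters by re-encoding with `errors="ignore"`.

```python
import io
import logging
import sys


def wrap_utf8(stream):
    """Return a UTF-8 text wrapper around the stream's underlying byte buffer."""
    buffer = getattr(stream, "buffer", None)
    if buffer is None:
        # Some replaced streams (e.g. in IDEs) expose no byte buffer; leave them alone.
        return stream
    return io.TextIOWrapper(
        buffer, encoding="utf-8", errors="replace", line_buffering=True
    )


def configure_stdio():
    """Switch stdout/stderr to UTF-8, but only on Windows."""
    if sys.platform == "win32":
        sys.stdout = wrap_utf8(sys.stdout)
        sys.stderr = wrap_utf8(sys.stderr)


def make_file_handler(path):
    """Create a log file handler that always writes UTF-8, whatever the locale."""
    return logging.FileHandler(path, encoding="utf-8")


def safe_print(text, stream=None):
    """Print text; on an encoding error, drop the characters the stream cannot encode."""
    stream = stream if stream is not None else sys.stdout
    try:
        print(text, file=stream)
    except UnicodeEncodeError:
        encoding = getattr(stream, "encoding", None) or "ascii"
        cleaned = text.encode(encoding, errors="ignore").decode(encoding)
        print(cleaned, file=stream)
```

Calling `configure_stdio()` once at startup covers everything printed afterwards, while `safe_print` is the last line of defence for code paths that run without that setup.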
Thanks for the fix! One case does not seem to be fully covered yet: the standalone console fallback in WorkflowLogger.

utils/logger.py adds a fallback for UnicodeEncodeError, but I was still able to reproduce an uncovered case locally. On a Windows GBK/CP936 console, using WorkflowLogger directly to output text containing emoji or other non-GBK characters can still raise an encoding error. The example I used to reproduce this was: 中文🙂.

So the main path launched through server_main.py appears to be fixed, but standalone usage of WorkflowLogger may still have a gap. Please consider improving the WorkflowLogger fallback so it can safely output Unicode content even when used independently of server_main.py.

It would also be helpful to add regression tests covering:
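The reproduction above does not need a real Windows console. As a rough sketch, a strict GBK `TextIOWrapper` over a `BytesIO` behaves like a CP936 console for encoding purposes. The `console_write` helper below is a hypothetical stand-in for WorkflowLogger's console output path, not its actual API, and using `backslashreplace` instead of stripping is an assumption made here so that the unencodable character stays visible in the output.

```python
import io


def console_write(text, stream):
    """Hypothetical stand-in for the logger's console output path."""
    try:
        stream.write(text + "\n")
    except UnicodeEncodeError:
        encoding = getattr(stream, "encoding", None) or "ascii"
        # backslashreplace keeps a readable escape instead of dropping the character.
        safe = text.encode(encoding, errors="backslashreplace").decode(encoding)
        stream.write(safe + "\n")
    stream.flush()


def make_gbk_console():
    """Simulate a GBK/CP936 console: strict encoding errors, no newline translation."""
    raw = io.BytesIO()
    console = io.TextIOWrapper(raw, encoding="gbk", errors="strict", newline="\n")
    return raw, console


def test_emoji_on_gbk_console():
    """The reviewer's example must not raise and must keep the encodable text."""
    raw, console = make_gbk_console()
    console_write("中文🙂", console)
    assert raw.getvalue().decode("gbk") == "中文\\U0001f642\n"


def test_plain_chinese_is_unchanged():
    """Text that GBK can encode must pass through untouched."""
    raw, console = make_gbk_console()
    console_write("中文", console)
    assert raw.getvalue().decode("gbk") == "中文\n"
```

Both functions follow the pytest naming convention, so they can be collected as-is once `console_write` is swapped for the real logger call.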